The code shown below is for five-fold cross-validation for this data set. CV was a cross-validation vector, in which five integers from one to five were randomly distributed. The vector test was used as the indexes of the testing data set. It was composed of one fold of the whole data set. The vector ‘-test’ was composed of the indexes of the training data set. It was composed of four folds of the whole data set. The matrix input was the data set with the class variable removed. The vector Z was a variable of the predicted class labels.
CV=sample(1:5,nrow(x),replace=TRUE)
input=x[,which(colnames(x)!='class')]
Z=rep(0,nrow(x))
for(fold in 1:5)
{
    test=which(CV==fold)
    model=lda(class ~.,x[-test,])
    Z[test]=predict(model,input[test,])$class
}
The following code was used for running the Jackknife test for the data set, where a pointer variable (n) was enumerated for every seed. When a seed was enumerated, the seed input[n,] was used as the testing data set and all the other seeds x[-n,] were used as the training data set.
Z=rep(0,nrow(x))
for(n in 1:nrow(x))
{
    model=lda(class ~.,x[-n,])
    Z[n]=predict(model,input[n,])$class
}
Table 3.4 shows two confusion matrices for the LDA models built using two generalisation test approaches for this seeds data set. One employed the K-fold cross-validation approach and the other employed the Jackknife test. The two generalisation test approaches ended up with two models of similar performance. Both models demonstrated a total accuracy of around 89%.
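The confusion matrices and the total accuracy can be obtained by tabulating the true class labels against the predicted labels in Z. A minimal sketch follows; it assumes x and Z come from either of the code fragments above, and that the class column is named class.

```r
# Sketch: confusion matrix and total accuracy for the predictions in Z.
# Assumes x$class holds the true labels and Z the predicted labels
# produced by the cross-validation or Jackknife code above.
cm <- table(actual = x$class, predicted = Z)
print(cm)

# Total accuracy: the proportion of seeds on the diagonal of the
# confusion matrix (correctly classified cases over all cases).
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

Each row of the resulting table corresponds to a true class and each column to a predicted class, matching the layout of the confusion matrices in Table 3.4.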
Figure 3.16(a) shows the ROC curves for these two generalisation test approaches using the ROCR package. The two AUC values were very close to each other.
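An ROC curve with the ROCR package can be sketched as below. ROCR works on binary labels, so this example treats one class against the rest (the label "1" here is an assumption about how the classes are coded); the scores are the LDA posterior probabilities returned by predict().

```r
# Sketch of an ROC curve and AUC with ROCR, assuming a one-vs-rest
# view of the seeds classes. The class label "1" is a placeholder
# for whichever class is of interest.
library(MASS)
library(ROCR)

model <- lda(class ~ ., x)
post  <- predict(model, input)$posterior[, "1"]  # P(class == "1")

pred <- prediction(post, x$class == "1")
perf <- performance(pred, "tpr", "fpr")
plot(perf)  # ROC curve: true positive rate vs false positive rate

auc <- performance(pred, "auc")@y.values[[1]]
print(auc)
```

For predictions from the cross-validation or Jackknife loops, the out-of-sample posterior probabilities would be collected inside the loop (analogously to Z) before being passed to prediction().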